Linear Programming for Finite State Multi-Armed Bandit Problems
Authors
Abstract
1. Introduction. An important sequential control problem with a tractable solution is the multi-armed bandit problem. It can be stated as follows. There are N independent projects, e.g., statistical populations (see Robbins 1952), gambling machines (or bandits), etc. The state of the νth of them at time t is denoted by x_ν(t) and it belongs to a set of possible states S_ν, which in this paper is assumed to be finite. Let S_ν = {1, ..., k_ν}. At each point in time one can work on one project only; if the νth of them is selected, one receives a reward r(t) = r^ν_{x_ν(t)} and its state changes according to a stationary transition rule p^ν_{ij} = P(x_ν(t+1) = j | x_ν(t) = i), while the states of all other projects remain unchanged: x_κ(t+1) = x_κ(t) if κ ≠ ν. Let x(t) = (x_1(t), ..., x_N(t)) and let π(t) denote the project selected at time t. The states of all projects are observable and the problem is to choose π(t) as a function of x(t), so as to maximize the expected total discounted reward, given an initial state x(0):
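The excerpt cuts off just before the objective, which for this model is presumably the expected total discounted reward E[Σ_{t≥0} β^t r(t) | x(0)] for some discount factor β ∈ (0, 1). As a rough illustration of the formulation only, and not of the paper's own linear-programming method (which this excerpt does not show), the sketch below builds the product-state MDP for a tiny two-project instance and solves it with the generic LP formulation of a discounted MDP via scipy.optimize.linprog. The project data, the discount factor, and all names are invented for the example.

```python
# Minimal sketch (assumed setup, not the paper's algorithm): build the joint-state MDP
# of the bandit model above and solve it with the standard LP for discounted MDPs,
#     minimize sum_x v(x)   s.t.   v(x) >= r^nu(x_nu) + beta * sum_j p^nu_{x_nu,j} v(x with x_nu := j)
# for every joint state x and every project nu.
import itertools
import numpy as np
from scipy.optimize import linprog

beta = 0.9  # discount factor (assumed; the excerpt truncates before defining it)

# Two independent projects, each with its own reward vector r^nu and
# transition matrix P^nu over its finite state set S_nu (made-up numbers).
rewards = [np.array([1.0, 5.0]),            # project 0: 2 states
           np.array([0.0, 2.0, 9.0])]       # project 1: 3 states
transitions = [np.array([[0.7, 0.3],
                         [0.4, 0.6]]),
               np.array([[0.5, 0.5, 0.0],
                         [0.1, 0.6, 0.3],
                         [0.2, 0.2, 0.6]])]

N = len(rewards)
state_space = list(itertools.product(*[range(len(r)) for r in rewards]))
index = {s: k for k, s in enumerate(state_space)}
S = len(state_space)

# One LP constraint per (joint state, selected project).
A_ub, b_ub = [], []
for x in state_space:
    for nu in range(N):
        row = np.zeros(S)
        row[index[x]] += 1.0
        for j, p in enumerate(transitions[nu][x[nu]]):
            y = list(x)
            y[nu] = j                        # only project nu's component moves
            row[index[tuple(y)]] -= beta * p
        A_ub.append(-row)                    # linprog expects <= constraints
        b_ub.append(-rewards[nu][x[nu]])

res = linprog(c=np.ones(S), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * S)
values = res.x                               # optimal value v(x) for each joint state

# Recover a greedy (one-step lookahead) policy from the optimal values.
for x in state_space:
    q = [rewards[nu][x[nu]]
         + beta * sum(p * values[index[x[:nu] + (j,) + x[nu + 1:]]]
                      for j, p in enumerate(transitions[nu][x[nu]]))
         for nu in range(N)]
    print(f"state {x}: select project {int(np.argmax(q))}, value {values[index[x]]:.3f}")
```

Note that the joint state space has ∏_ν k_ν elements, so this brute-force LP is only workable for very small instances; index results such as Gittins' theorem are valuable precisely because they avoid computing over the product space.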
Similar resources
Tax Problems in the Undiscounted Case
The aim of this paper is to evaluate the performance of the optimal policy (the Gittins index policy) for open tax problems of the type considered by Klimov in the undiscounted limit. In this limit, the state-dependent part of the cost is linear in the state occupation numbers for the multi-armed bandit, but is quadratic for the tax problem. The discussion of the passage to the limit for the tax...
Large-Scale Bandit Problems and KWIK Learning
We show that parametric multi-armed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as “Knows What It Knows” or KWIK learning. We give matching impossibility results showing that the KWIK-learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results in both the standard...
Four proofs of Gittins' multiarmed bandit theorem
We study four proofs that the Gittins index priority rule is optimal for alternative bandit processes. These include Gittins’ original exchange argument, Weber’s prevailing charge argument, Whittle’s Lagrangian dual approach, and Bertsimas and Niño-Mora’s proof based on the achievable region approach and generalized conservation laws. We extend the achievable region proof to infinite countable ...
Complexity Constraints in Two-Armed Bandit Problems: An Example
This paper derives the optimal strategy for a two-armed bandit problem under the constraint that the strategy must be implemented by a finite automaton with an exogenously given, small number of states. The idea is to find learning rules for bandit problems that are optimal subject to the constraint that they must be simple. Our main results show that the optimal rule involves an arbitrary init...
An Optimal Algorithm for Linear Bandits
We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order √(Td ln N) on any finite class X ⊆ ℝ^d of N actions, and of order d√T (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an app...
Journal: Math. Oper. Res.
Volume: 11, Issue: -
Pages: -
Publication date: 1986